[SPARK-56060][PS] Handle pandas 3 null string conversion in describe() for empty timestamp frames #54893

Closed
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56060/describe

Conversation

@ueshin
Member

@ueshin ueshin commented Mar 18, 2026

What changes were proposed in this pull request?

This PR updates pandas-on-Spark DataFrame.describe() and the related test_describe_empty expectations for empty timestamp-containing frames to handle the pandas 3 astype(str) behavior change on null values.

In pandas 2, empty timestamp stats were string-converted as "None" in the relevant describe() path. In pandas 3, astype(str) preserves those empty stats as missing values instead. This patch updates the pandas-on-Spark result construction and the corresponding test expectations to follow that behavior consistently.
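
The difference described above can be observed directly in pandas. The following is a minimal sketch, not code from this patch: it assumes only that pandas is installed, and the function name `null_string_conversion` is made up for illustration. It reports which of the two behaviors the installed pandas exhibits rather than assuming a particular version.

```python
import pandas as pd


def null_string_conversion() -> str:
    """Classify how the installed pandas string-converts null values.

    Hypothetical helper for illustration only; it is not part of
    pandas or pandas-on-Spark.
    """
    # An object-dtype Series of nulls stands in for the empty
    # timestamp stats mentioned in the description.
    nulls = pd.Series([None, None], dtype=object)
    converted = nulls.astype(str)

    if converted.isna().all():
        # The nulls are still missing values after astype(str).
        return "preserved"
    if (converted == "None").all():
        # The nulls were turned into the literal string "None".
        return "stringified"
    return "other"


if __name__ == "__main__":
    print(pd.__version__, null_string_conversion())
```

Per the description, a pandas 2 environment would be expected to report the stringified case and a pandas 3 environment the preserved case.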

Why are the changes needed?

pyspark.pandas.tests.computation.test_describe FrameDescribeTests.test_describe_empty fails with pandas 3 because pandas changed how astype(str) handles null values in empty timestamp describe() results.

Without this change, pandas-on-Spark and the pandas-based expectation disagree for empty timestamp-only and mixed timestamp frames.
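
One way a test could keep its expectation aligned across both major versions is a version gate. The sketch below is hypothetical and is not taken from the patch: `parse_major` and `expected_null_stat` are illustrative names, and Python's `None` stands in for a missing value.

```python
from typing import Optional


def parse_major(version: str) -> int:
    """Return the major component of a version string such as '2.2.3'."""
    return int(version.split(".")[0])


def expected_null_stat(pandas_version: str) -> Optional[str]:
    """Hypothetical helper: expected value of an empty timestamp stat.

    pandas 2: astype(str) yields the string "None".
    pandas 3: astype(str) keeps the value missing (modelled as None).
    """
    if parse_major(pandas_version) >= 3:
        return None
    return "None"


if __name__ == "__main__":
    for version in ("2.2.3", "3.0.0"):
        print(version, repr(expected_null_stat(version)))
```

In a real test the version string would come from `pd.__version__`; it is passed in explicitly here so the sketch runs without pandas.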

Does this PR introduce any user-facing change?

Yes.

For pandas-on-Spark DataFrame.describe() on empty timestamp-containing frames, null timestamp stats now follow the pandas 3 string-conversion behavior instead of always being materialized as "None".

How was this patch tested?

Ran the related pyspark.pandas.tests.computation.test_describe tests in both pandas 2 and pandas 3 Python environments.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5)

@ueshin
Member Author

ueshin commented Mar 18, 2026

@HyukjinKwon
Member

Merged to master.

terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026
…) for empty timestamp frames

Closes apache#54893 from ueshin/issues/SPARK-56060/describe.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>